Pronunciation Modelling of Foreign Words for Sepedi ASR

Authors

  • Thipe Modipa
  • Marelie H. Davel
Abstract

This study focuses on the effective pronunciation modelling of words from different languages encountered during the development of a Sepedi automatic speech recognition (ASR) system. While the speech corpus used for training the ASR system consists mostly of Sepedi utterances, many words from English (and other South African languages) are embedded within the Sepedi sentences. To model these words effectively, we investigate three approaches to pronunciation dictionary development: (1) using language-specific letter-to-sound rules to predict the pronunciation of each word (based on the language of the word) and mapping foreign phonemes to Sepedi phonemes using linguistically motivated mappings, (2) experimenting with data-driven foreign-to-Sepedi phoneme mappings, and (3) using Sepedi letter-to-sound rules to predict the pronunciation of all words irrespective of language. We find that the data-driven phoneme mappings are more accurate than the initial linguistically motivated mappings evaluated, and we obtain our best result, by a slight margin, using Sepedi letter-to-sound rules across all words in the speech corpus.
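As an illustration of approach (2), a data-driven foreign-to-Sepedi phoneme mapping could be sketched as below. This is a minimal sketch under stated assumptions, not the method used in the paper: the abstract does not say how the mappings were derived, so the sketch simply assumes that aligned (foreign phoneme, Sepedi phoneme) pairs are available and picks the most frequent Sepedi phoneme for each foreign one. The function names `learn_mapping` and `map_pronunciation` and the phoneme symbols `F1`, `S1`, and so on are hypothetical placeholders, not real phoneme inventories.

```python
from collections import Counter, defaultdict


def learn_mapping(aligned_pairs):
    """For each foreign phoneme, pick the target phoneme it is aligned with most often.

    aligned_pairs: iterable of (foreign_phoneme, target_phoneme) tuples.
    How such alignments are obtained is an assumption and is not covered here.
    """
    counts = defaultdict(Counter)
    for foreign, target in aligned_pairs:
        counts[foreign][target] += 1
    return {foreign: c.most_common(1)[0][0] for foreign, c in counts.items()}


def map_pronunciation(phonemes, mapping):
    """Rewrite a pronunciation using the mapping; unmapped phonemes are kept unchanged."""
    return [mapping.get(p, p) for p in phonemes]


# Made-up alignment data with placeholder symbols (hypothetical, for illustration only).
pairs = [("F1", "S1"), ("F1", "S1"), ("F1", "S2"), ("F2", "S3")]
mapping = learn_mapping(pairs)
mapped = map_pronunciation(["F1", "F2", "F3"], mapping)
print(mapping)
print(mapped)
```

A linguistically motivated mapping, as in approach (1), would use the same `map_pronunciation` step but with a hand-written dictionary in place of the learned one.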

Similar resources

Unsupervised topic adaptation for morph-based speech recognition

Topic adaptation in automatic speech recognition (ASR) refers to the adaptation of the language model and vocabulary for improved recognition of in-domain speech data. In this work we implement unsupervised topic adaptation for morph-based ASR to improve recognition of foreign entity names. Based on the first-pass ASR hypothesis, similar texts are selected from a collection of articles, which are used ...

Speech is like a box of

Pronunciation variability is present in both native and foreign words. Since pronunciation variability constitutes a problem for automatic speech recognition (ASR) systems, modeling pronunciation variation for ASR has been the topic of various studies. In most studies, modeling pronunciation variation was attempted within the standard framework used in mainstream ASR systems. Given that some as...

Pronunciation modeling of foreign words for Mandarin ASR by considering the effect of language transfer

One of the challenges in automatic speech recognition is the recognition of foreign words. It is observed that a speaker’s pronunciation of a foreign word is influenced by knowledge of his native language; this phenomenon is known as the effect of language transfer. This paper focuses on examining the phonetic effect of language transfer in automatic speech recognition. A set of lexical rules is prop...

Language identification of individual words with joint sequence models

Within a multilingual automatic speech recognition (ASR) system, knowledge of the language of origin of unknown words can improve pronunciation modelling accuracy. This is of particular importance for ASR systems required to deal with code-switched speech or proper names of foreign origin. For words that occur in the language model, but do not occur in the pronunciation lexicon, text-based langu...

SIAK - A Game for Foreign Language Pronunciation Learning

We introduce a digital game for children’s foreign-language learning that uses automatic speech recognition (ASR) for evaluating children’s utterances. Our first prototype focuses on the learning of English words and their pronunciation. The game connects to a network server, which handles the recognition and pronunciation grading of children’s foreign-language speech. The server is reusable fo...

Publication date: 2010